Exploiting Sentence Similarities for Better Alignments
Abstract
We study the problem of jointly aligning sentence constituents and predicting their similarities. While extensive sentence similarity data exists, manually generating reference alignments and labeling the similarities of the aligned chunks is comparatively onerous. This prompts the natural question of whether we can exploit easy-to-create sentence level data to train better aligners. In this paper, we present a model that learns to jointly align constituents of two sentences and also predict their similarities. By taking advantage of both sentence and constituent level data, we show that our model achieves state-of-the-art performance at predicting alignments and constituent similarities.
Similar resources
Reliable Measures for Aligning Japanese-English News Articles and Sentences
We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignm...
MUTT: Metric Unit TesTing for Language Generation Tasks
METEOR: a metric that computes soft similarities between sentences by computing synonym and paraphrase scores between sentence alignments. SICK+: Since SICK is for compositional semantics, all sentences have proper grammar. We automatically generated ungrammatical sentences (without human-estimated scores) to supplement the existing sentence pairs. Dataset Case Study: SICK: We examine how well hu...
Parallel Seed-Based Approach to Multiple Protein Structure Similarities Detection
Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in an order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a qua...
DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition
We describe a set of top-performing systems at the SemEval 2015 English Semantic Textual Similarity (STS) task. Given two English sentences, each system outputs the degree of their semantic similarity. Our unsupervised system, which is based on word alignments across the two input sentences, ranked 5th among 73 submitted system runs with a mean correlation of 79.19% with human annotations. We a...
Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine transla...
Publication date: 2016